ABSTRACT

CircaDB (http://circadb.org) is a new database of 
circadian transcriptional profiles from time course 
expression experiments from mice and humans. 
Each transcript's expression was evaluated by
three separate algorithms: JTK_CYCLE, Lomb
Scargle and de Lichtenberg. Users can query the
gene annotations using simple and powerful full
text search terms, restrict results to specific data
sets and provide probability thresholds for each
algorithm. Visualizations of the data are intuitive
charts that convey profile information more
effectively than a table of probabilities. The CircaDB web
application is open source and available at
http://github.com/itmat/circadb.

INTRODUCTION 

Circadian rhythms are biological rhythms of ~24h in 
many physiological and behavioral processes (1,2). These 
rhythms are generated by a cell autonomous circadian 
clock, present in most cells in mammals. This circadian 
clock is composed of interlocked transcriptional, transla- 
tional feedback loops, where transactivators activate 
repressors that later feedback on the activators (3). 
Components of the required E-box loop include Bmal1,
Bmal2, Clock and Npas2, bHLH-PAS transactivators;
Per1, Per2 and Per3, PAS domain containing repressors;
and Cry1 and Cry2 (4), transcriptional repressors related
to cryptochromes from plants and insects. An important
secondary loop also exists, the ROR loop, which comprises
the transcriptional repressors Rev-erb-alpha and Rev-erb-beta,
as well as the transcriptional activators Rora, Rorb and Rorc
(5-7). Factors in this loop regulate transcript
levels of several of the E-box components including 
Bmal1, Cry1, Npas2 and Per2. The cAMP Responsive
Element Binding Protein (CREB) pathway (8,9) and
the D-box binding factors Dbp, Hlf, Tef and Nfil3 also regulate
clock function (10,11). Thus, transcription factors play a
major role in the functioning of the core clock. 

In addition to regulating transcription of each other, 
clock factors also impart circadian rhythms in expression 
of many 'output' genes. First-order clock-controlled genes are
those directly regulated by clock factors (e.g.
Clock/Bmal1), while second-order output genes are
regulated by a first-order clock-controlled gene, but not by clock
components (12-14). Because of this, the research community
has spent more than a decade cataloging genes under
clock control (12,13,15-17). Historically, these include 
many disease genes, drug targets and important compo- 
nents of various biological pathways (1,18-20). For 
example, HMG-CoA reductase, the rate limiting enzyme 
of cholesterol biosynthesis and target of statins, is under 
clock control in liver (21). Several factors have catalysed a 
more complete description of circadian rhythms, including 
the advent of DNA arrays (16) and now RNA sequencing 
(22), powerful statistical approaches to find rhythmic 
genes (23) and appropriate experimental design. 

The goal of CircaDB is to systematically collect, analyse
and visualize circadian expression profiles for bench
researchers in a simple and straightforward fashion.
Common queries are supported, from direct look-ups of
expression profiles to compound queries that search
keywords in the gene annotation across multiple tissues,
with the ability to restrict results by probability of cycling.



MATERIALS AND METHODS 

Various publicly available microarray time course studies
(23-26) were collected (Table 1). References and links to
download the expression data sets are outlined on the website.
Data from each study were re-analysed using three
circadian rhythm detection algorithms: JTK_CYCLE,
Lomb Scargle and de Lichtenberg (23,27,28). Table 2 lists
the runtime parameters of the algorithms on each data
set. The reported expression values from each study
were not filtered, as each algorithm accounts for
technical replicates. The significance calls and other
results reported by each algorithm were entered into a
MySQL database.

Table 1. Expression data sets in CircaDB

Name                          Time points  Species/tissue
Panda 2002                    12           Mouse suprachiasmatic nuclei (SCN) of the hypothalamus, and liver
Hughes 2009                   48           Mouse liver, NIH3T3 cells, pituitary gland and human U2OS cells
Miller 2007 and Andrews 2010  12 (WT)      Wild type mouse liver, SCN and skeletal muscle
                              7 (KO)       Clock mutant mouse liver, SCN and skeletal muscle
Rudic 2004                    12           Mouse aorta, kidney

Table 2. Runtime parameters for each data set and algorithm

Data set             JTK_CYCLE         Lomb Scargle                                                                          De Lichtenberg
Panda 2002           Periods: 16-32 h  minFrequency = 1/32, maxFrequency = 1/18 (periods = 18-32 h); #test frequencies: 4*N  Period = 24 h; #Permutations = 10000
Hughes 2009 (mouse)  Periods: 6-42 h   minFrequency = 1/6, maxFrequency = 1/42 (periods = 6-42 h); #test frequencies: 4*N    Period = 24 h; #Permutations = 10000
Hughes 2009 (human)  Periods: 6-42 h   minFrequency = 1/6, maxFrequency = 1/42 (periods = 6-42 h); #test frequencies: 4*N    Period = 24 h; #Permutations = 10000
Miller 2007          Periods: 16-32 h  minFrequency = 1/32, maxFrequency = 1/18 (periods = 18-32 h); #test frequencies: 4*N  Period = 24 h; #Permutations = 10000
Andrews 2010         Periods: 20-28 h  minFrequency = 1/6, maxFrequency = 1/42 (periods = 6-42 h); #test frequencies: 4*N    Period = 24 h; #Permutations = 10000
Rudic 2004           Periods: 16-32 h  minFrequency = 1/32, maxFrequency = 1/18 (periods = 18-32 h); #test frequencies: 4*N  Period = 24 h; #Permutations = 10000

Data sets are listed in Table 1.
N = number of time points in the series.
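
To illustrate the Lomb Scargle settings in Table 2, the following Python sketch builds a grid of 4*N test frequencies across a period range and evaluates the classical Lomb Scargle periodogram on a synthetic 24 h rhythm. It is not the implementation used for CircaDB: the function names, the evenly spaced frequency grid and the synthetic data are assumptions made for this example.

```python
import math


def frequency_grid(min_period_h, max_period_h, n_timepoints, oversample=4):
    """Return oversample*N evenly spaced test frequencies (cycles per hour)."""
    f_lo, f_hi = 1.0 / max_period_h, 1.0 / min_period_h
    n = oversample * n_timepoints
    step = (f_hi - f_lo) / (n - 1)
    return [f_lo + i * step for i in range(n)]


def lomb_scargle(times, values, freqs):
    """Classical normalized Lomb Scargle power at each test frequency."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    powers = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # The time offset tau makes the sine and cosine terms orthogonal.
        tau = math.atan2(sum(math.sin(2 * w * t) for t in times),
                         sum(math.cos(2 * w * t) for t in times)) / (2 * w)
        c = [math.cos(w * (t - tau)) for t in times]
        s = [math.sin(w * (t - tau)) for t in times]
        yc = sum((v - mean) * ci for v, ci in zip(values, c))
        ys = sum((v - mean) * si for v, si in zip(values, s))
        cc = sum(ci * ci for ci in c)
        ss = sum(si * si for si in s)
        powers.append((yc * yc / cc + ys * ys / ss) / (2.0 * var))
    return powers


# Synthetic example (not CircaDB data): hourly samples for 48 h, true period 24 h.
times = list(range(48))
values = [10.0 + 3.0 * math.cos(2 * math.pi * t / 24.0) for t in times]
freqs = frequency_grid(6, 42, len(times))
powers = lomb_scargle(times, values, freqs)
best_period = 1.0 / freqs[powers.index(max(powers))]
print(f"{len(freqs)} test frequencies, best period ~ {best_period:.1f} h")
```

Because the grid is discrete, the reported period is the grid value closest to the true rhythm rather than exactly 24 h.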

Gene annotation data were downloaded from the
Affymetrix NetAffx resource
(http://www.affymetrix.com/analysis/index.affx). Annotations were then entered
into the database alongside the unfiltered experimental 
values and the results of the circadian rhythm detection 
algorithms. Transcript information was supplemented 
with links to the GeneWiki project (29,30) and 
Homologene (http://www.ncbi.nlm.nih.gov/homologene). 
The data model for the database is described in Figure 1. 

The transcript annotation and the statistical results were
indexed with the Sphinx full text search system
(http://sphinxsearch.com/). Visualizations of the data are
created using pre-formatted URI requests to
the Google Charts API (https://developers.google.com/chart/).
The web application was coded using the Ruby
on Rails framework (http://rubyonrails.org/).
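
As a rough illustration of a pre-formatted chart URI, the Python sketch below encodes one time course as a line-chart request. The endpoint and the parameters (cht, chs, chd, chds, chxt) are those of the legacy Google Image Charts service; which parameters CircaDB actually sets is not stated here, so the helper name, the chart size and the expression values are all assumptions.

```python
from urllib.parse import urlencode

# Assumed endpoint of the legacy Google Image Charts service.
CHART_ENDPOINT = "https://chart.googleapis.com/chart"


def chart_uri(values, width=500, height=200):
    """Build a line-chart URI for one time course (hypothetical helper)."""
    lo, hi = min(values), max(values)
    params = {
        "cht": "lc",                                        # line chart
        "chs": f"{width}x{height}",                         # size in pixels
        "chd": "t:" + ",".join(f"{v:g}" for v in values),   # data, text encoding
        "chds": f"{lo:g},{hi:g}",                           # scale to the data range
        "chxt": "x,y",                                      # show both axes
    }
    return CHART_ENDPOINT + "?" + urlencode(params)


# Made-up expression values for illustration only.
uri = chart_uri([512.0, 830.5, 1204.0, 790.25, 498.0])
print(uri)
```

Storing such a URI next to the raw data points means a report page needs only an image tag, with no server-side plotting at request time.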

All source code for data loading and the web application
is licensed under the GNU General Public License
(GPL-2.0) and available at
http://github.com/itmat/circadb.



RESULTS AND DISCUSSION 

In creating CircaDB, we have provided the research com- 
munity a clear, concise and powerful interface for 
querying genes within the context of circadian expression 
profile data. Another circadian expression database,
Diurnal 2.0 (31), provides a similar resource to CircaDB
but focuses on plant data. It also restricts its initial search
to transcript accessions, whereas CircaDB supports
advanced keyword searches over the full gene annotation.
This includes the ability to search by phrases, boolean 
conditions and combinations thereof. Queries can also 
be restricted by a given experiment's data set, phase of 
expression and significance of a particular algorithm 
(Figure 2). 
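
The restrictions described above (data set, phase of expression and significance threshold) can be pictured as a filter applied to the keyword search results. The Python sketch below is illustrative only: the record fields, the function name and the example rows are hypothetical and do not reflect the CircaDB code.

```python
from dataclasses import dataclass


@dataclass
class StatRow:
    """One probeset's result in one data set (field names are hypothetical)."""
    probeset: str
    symbol: str
    dataset: str
    jtk_p: float
    jtk_phase: float


def restrict(rows, datasets=None, p_cutoff=0.05, phase_range=None):
    """Keep rows from the chosen data sets that pass the p-value and phase filters."""
    kept = []
    for row in rows:
        if datasets and row.dataset not in datasets:
            continue
        if row.jtk_p > p_cutoff:
            continue
        if phase_range and not (phase_range[0] <= row.jtk_phase <= phase_range[1]):
            continue
        kept.append(row)
    return kept


# Made-up rows for illustration only.
rows = [
    StatRow("probe_a", "GeneA", "liver", 3.0e-16, 4.5),
    StatRow("probe_b", "GeneB", "liver", 0.40, 12.0),
    StatRow("probe_c", "GeneC", "pituitary", 1.0e-4, 5.0),
]
hits = restrict(rows, datasets={"liver"}, p_cutoff=0.05, phase_range=(0, 6))
print([r.probeset for r in hits])  # prints ['probe_a']
```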

The Database of Circadian Gene Expression (24),
part of the Gene Atlas Project (32), contains a subset of
the data sets in CircaDB, but uses a single circadian
expression algorithm. CircaDB contains all of these
data, re-analysed with a newer and more robust
set of algorithms (23,27,28). Three algorithms were used
to allow inspection of the differences between each
algorithm's results (Figure 3). CircaDB is actively
maintained, and new features and data sets will continue
to be added as they become available. Requests for
integration of data sets can be submitted through the
project site on GitHub. CircaDB also provides expression
profiles for integration within BioGPS (33).

Finally, to facilitate use of this database framework
by other research groups, we have made the source
code for the application freely available under the
GPL 2.0 open source license. The project has
recently been used to visualize circadian experiments for
Anopheles gambiae (34). Together, these features make
CircaDB a unique and valuable resource for the circadian
research community.

Figure 1. The database schema. Boxes represent tables, and edges represent foreign key relationships (one-to-many or one-to-one). Further documentation is available at http://github.com/itmat/circadb.

Tables shown in the figure:
GeneChip: the record for an Affymetrix array chip.
Probeset: a record for a probeset along with the transcript annotation.
ProbesetData: the raw data points for a given probeset for a given experiment, along with a pre-computed URI for chart generation.
Assay: a gene expression study.
ProbesetStat: the statistical values from the circadian algorithms for a given probeset from a given experiment.
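
To make the schema in Figure 1 more concrete, the Python sketch below creates the five tables in an in-memory SQLite database and joins a probeset to its statistics. SQLite stands in for MySQL so that the example is self-contained. All table names, column names and foreign keys are assumptions inferred from the figure descriptions, not the definitions in the repository. The single example row reuses the Arntl values shown in Figure 3.

```python
import sqlite3

# Table and column names are illustrative; the real schema is in the repository.
SCHEMA = """
CREATE TABLE gene_chips (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE probesets (
    id INTEGER PRIMARY KEY,
    gene_chip_id INTEGER REFERENCES gene_chips(id),
    probeset_name TEXT, gene_symbol TEXT);
CREATE TABLE assays (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE probeset_data (
    id INTEGER PRIMARY KEY,
    probeset_id INTEGER REFERENCES probesets(id),
    assay_id INTEGER REFERENCES assays(id),
    data_points TEXT, chart_uri TEXT);
CREATE TABLE probeset_stats (
    id INTEGER PRIMARY KEY,
    probeset_id INTEGER REFERENCES probesets(id),
    assay_id INTEGER REFERENCES assays(id),
    jtk_p_value REAL, jtk_period REAL);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO gene_chips VALUES (1, 'example chip')")
conn.execute("INSERT INTO probesets VALUES (1, 1, '1425099_a_at', 'Arntl')")
conn.execute("INSERT INTO assays VALUES (1, 'example liver time course')")
conn.execute("INSERT INTO probeset_stats VALUES (1, 1, 1, 3.0e-16, 24.5)")

# Join a probeset to its statistics within one assay.
row = conn.execute(
    """SELECT p.gene_symbol, a.name, s.jtk_p_value
       FROM probeset_stats s
       JOIN probesets p ON p.id = s.probeset_id
       JOIN assays a ON a.id = s.assay_id
       WHERE s.jtk_p_value < 0.05"""
).fetchone()
print(row)
```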



Figure 2. (a) The query interface for CircaDB. The interface consists of a simple and powerful full-text search capability, with possible restrictions on the data sets, phase information and a significance threshold for a given algorithm. (b) The set of available threshold categories for the circadian classification algorithms.

Panel (a) fields: search box (examples: 1451371_at, 1422470_at, gnf1m00037_a_at, Arntl, kinase inhibitor); query syntax mode (simple or advanced); probability filter (JTK P-value shown); probability cut-off value (0.05 shown); JTK phase range; experiment selection (if nothing is selected, all experiments are searched).

Panel (b) options: JTK P-value, JTK Q-value, Lomb Scargle P-value, Lomb Scargle Q-value, DeLichtenberg P-value, DeLichtenberg Q-value.
Nucleic Acids Research



Figure 3. Expression profile report. A simple visualization of the data accompanies the main annotation of the gene probe, probability values from various circadian rhythm detection algorithms and other circadian information.

Annotation shown in the example report:
Tissue: liver
UCSC RNAseq: RNAseq_NM_007489
Probeset: 1425099_a_at
Links: Wikipedia, HomoloGene
Symbol: Arntl
Unigene: Mm.440371
RefSeq Protein: NP_031515
RefSeq DNA: NM_007489
Description: aryl hydrocarbon receptor nuclear translocator-like

Algorithm      p-Value   q-Value      period  phase
JTK            3.0e-16   2.71e-13     24.5    4.5
Lomb Scargle   2.58e-08  9.13045e-05  24.383  29.97
DeLichtenberg  0.0       0.0          24.0    NA